Adaptation and Cryptic Pseudogenization in Penguin Toll-Like Receptors

Abstract Penguins (Sphenisciformes) are an iconic order of flightless, diving seabirds distributed across a large latitudinal range in the Southern Hemisphere. The extensive area over which penguins are endemic is likely to have fostered variation in pathogen pressure, which in turn will have imposed differential selective pressures on the penguin immune system. At the front line of pathogen detection and response, the Toll-like receptors (TLRs) provide insight into host evolution in the face of microbial challenge. TLRs respond to conserved pathogen-associated molecular patterns and are frequently found to be under positive selection, despite retaining specificity for defined agonist classes. We undertook a comparative immunogenetics analysis of TLRs for all penguin species and found evidence of adaptive evolution that was largely restricted to the cell surface-expressed TLRs, with evidence of positive selection at, or near, key agonist-binding sites in TLR1B, TLR4, and TLR5. Intriguingly, TLR15, which is activated by fungal products, appeared to have been pseudogenized multiple times in the Eudyptes spp., but a full-length form was present as a rare haplotype at the population level. However, in vitro analysis revealed that even the full-length form of Eudyptes TLR15 was nonfunctional, indicating an ancestral cryptic pseudogenization prior to its eventual disruption multiple times in the Eudyptes lineage. This unusual pseudogenization event could provide an insight into immune adaptation to fungal pathogens such as Aspergillus, which is responsible for significant mortality in wild and captive bird populations.


Background
One explanation for the pseudogenization of TLR15 is that the gene underwent duplication, and one copy experienced functional redundancy and degradation in the genome. Even though this scenario does not fit well with TLR15 (there were several homozygous pseudogene haplotypes and we did not observe any triallelic sites at the locus), another method of detecting duplication is through analysis of read coverage of the locus.

Methods
BAM files from analysis conducted in Pan, et al. (2019) were obtained for three penguin species (one non-Eudyptes spp. and two Eudyptes spp.: Spheniscus humboldti, Eudyptes robustus, Eudyptes chrysocome. Using coordinates derived from known coding sequences from Pan, et al. (2019), average depth of coverage was calculated for each region using samtools depth and awk. Using BLAST, the genomic position of the TLR15 locus was determined for each species and average coverage was determined for this region. Finally, average depth of coverage was plotted in a pairwise fashion between pairs of the three genomes. Figure S2: Pairwise depth of coverage of TLR15 and all other known coding sequences in the genomes of three penguin species. Depth of coverage for all coding loci are plotted (small points) and coloured according to their density. Depth of coverage for TLR15 is overlaid (red point).

Results
In each pairwise comparison, depth of coverage for TLR15 lies well within the high-density cluster of coding loci in the genome.
Interpretation Since depth of coverage for TLR15 is comparable to the vast majority of coding sequences in the genome, it is highly likely that TLR15 has not undergone duplication in the Eudyptes lineage. This, along with the presence of homozygous mutations lack of triallelic sites, implies that TLR15 is a unitary pseudogene in the Eudyptes penguins. Supplementary Table S1. Results from site models comparisons (M2/M2a and M7/M8) in the codeml program in PAML. Likelihood ratio tests were performed by calculating double the difference in log likelihood values between the alternative model (M2a or M8) and the null model (M2 or M7). P values were obtained from the chi-squared distribution with 2 degrees of freedom (calculated in each case as the difference between the numbers of parameters included in each model). Comparisons with P values < 0.05 were considered to be statistically significant, and are highlighted in green. Note that only the non-gene-converted regions of the TLR1/2 family were included in the analysis.    Table S3: Analysis of positively selected sites in penguin TLRs using different methods. Alignments used for PAML analysis were also used for MEME and FUBAR analysis and results of MEME and FUBAR analyses are given here with reference to sites that were found to be positively selected in PAML analysis. The significance threshold for MEME was taken to be the default P ≤ 0.1 and the significance threshold for FUBAR was taken to be the default posterior probability > 0.9. Significant sites in MEME and FUBAR analysis are shaded green.  , et al. (2020) and unpublished data were used for the TLR15 population-level analysis. Since these sources of data used different seqeuencing technologies and assembly pipelines, only data from Vianna et al. were used for the TLR15 analysis, rather than a combined dataset. All data used in the analysis are available and details of accession numbers can be found in the 'Data Availability' section of the Main Text.

TLR
Supplementary Material -Identification of putative loss-of-function mutations in Eudyptes TLR15.
Including Supplementary Tables S5-S7 and Supplementary Figure S3 Alignment of intact Eudyptes TLR15 haplotypes with other birds TLR15 protein sequences were downloaded from Ensembl or NCBI (chicken, ENSGALP00000013260; northern fulmar, XP_009585200.1, misannotated as TLR2; emu, ENSDNVP00000009442; blue tit, ENSCCEP00000009931; collared flycatcher, ENSFALP00000015961; helmeted guineafowl, ENSNMEP00000012998). These were aligned to TLR15 sequences from other penguins (n=11) and intact Eudyptes haplotypes (n=45, including the consensus TLR15 used in functional analysis). Next, residues were identified which are well conserved among penguins and other birds, but that are distinct in Eudyptes TLR15. These sites are presented in Table S5.  Table S5. Amino acid and nucleotide positions of positions which are conserved among penguins and other birds, but are distinct in the Eudyptes. Red text indicates that the position is highly conserved across the whole of vertebrates (see Figure S3). *The extra intron-spanning methionine at the start of chicken TLR15 was removed prior to alignment.
Next, the variant positions were plotted against the conservation scores of TLR15 across 77 vertebrate genomes (data obtained from UCSC; Figure S3). Amino acid sites 161, 736 and 787 are highly conserved across vertebrates, but are different in Eudyptes, which could be indicative of a change in function. The mechanism of TLR15 activation has not been elucidated, so it is unclear which of these polymorphisms is the loss-of-function mutation, or whether multiple mutations resulted in the phenotype. Figure S3. Per-nucleotide conservation scores for TLR15 in chicken compared to 77 other vertebrate genomes (data obtained from UCSC). Sites that were identified as being distinct in Eudyptes compared to the rest of penguins are highlighted in red.
Next, variants found in the Eudyptes were used to replace equivalent residues in the chicken TLR15 sequence, and the functional consequences predicted using Provean and SIFT. Provean analysis predicted L161P (chicken residue 160) and L683S (chicken residue 674) to be deleterious changes (Table S6), while SIFT also predicted L683S to be a deleterious change (Table S7). SIFT also predicted deleterious effects for T56M (chicken residue 55) and E68G (chicken residue 67), but since so few sequences were included as comparators for these sites, the prediction is of low confidence.  Table S6. Provean predictions for changes in TLR15 protein function. The chicken TLR15 sequence was modified with the indicated changes (sites are equivalent to the sites described in Table S5 for the Eudyptes penguins). Table S7. SIFT predictions for changes in TLR15 protein function. As above, the chicken TLR15 sequence was modified with the indicated changes. Sites where SIFT indicated low confidence in predictions are indicated in 'Notes'.
In summary, analysis of nucleotide conservation scores indicates that three non-synonymous variants in the Eudyptes (amino acid sites 161, 736 and 787) are otherwise highly conserved across vertebrates. This could be evidence of putative change in function at these sites. Moreover, Provean analysis predicts that